Abstract

This is an exposition of the combinatorial proof of the density Hales-Jewett theorem, due to D. H. J. Polymath in 2012. The theorem says that for given $\delta > 0$ and $k$, for every $n > n_0$ every set $A \subset \{1, 2, \dots, k\}^n$ with $|A| \ge \delta k^n$ contains a combinatorial line. It implies Szemerédi's theorem, which claims that for given $\delta > 0$ and $k$, for every $n > n_0$ every set $A \subset \{1, 2, \dots, n\}$ with $|A| \ge \delta n$ contains a $k$-term arithmetic progression.

1 Introduction 

The purpose of this text is to familiarize the author, and possibly the interested reader, with the recent remarkable elementary proof [20, 19] of Polymath (a group of mathematicians, see Nielsen [15] and Gowers [3] for more information) of the density Hales-Jewett theorem, one of the deepest results in extremal combinatorics and Ramsey theory, which has as an easy corollary the famous theorem of Szemerédi, and indeed its multidimensional generalization. The author hopes to use it in his future book on number theory; other similar fragments available online are [HI [TU [16]. We begin by recalling the mentioned theorems and introducing some notation. Further notation, concepts and auxiliary results will be introduced along the way.

We denote $\mathbb{N} = \{1, 2, \dots\}$, $\mathbb{N}_0 = \{0, 1, \dots\}$ and, for $n \in \mathbb{N}$, $[n] = \{1, 2, \dots, n\}$. For finite sets $B \ne \emptyset$ and $A$, we call the ratio of cardinalities $|A \cap B| / |B| \in [0, 1]$ the density of $A$ in $B$ and write $\mu_B(A)$ for it; when $B$ is understood from the context, we write just $\mu(A)$ and speak of the density of $A$. Later we consider more general densities. Densities and the quantities bounding them are denoted by the Greek letters $\mu, \delta, \varepsilon, \gamma, \nu, \eta, \theta, \beta$ and are real numbers from the interval $[0, 1]$. A partition of a set $A$ is an expression of $A$ as a disjoint union of possibly empty sets. Note that if $B = \bigcup_{i \in I} B_i$ is a partition and $\mu_B(A) \ge \delta$, then $\mu_{B_i}(A) \ge \delta$ for some $i$. For $a, d, k \in \mathbb{N}$, the $k$-element set

$$\{a, a + d, a + 2d, \dots, a + (k - 1)d\}$$

is the $k$-term arithmetic progression. The following is the famous Szemerédi's theorem [22].

Theorem 1 For every $\delta > 0$ and $k \in \mathbb{N}$, there exists an $n_0 \in \mathbb{N}$ such that for every $n > n_0$ every set $A \subset [n]$ with $\mu(A) \ge \delta$ contains a $k$-term arithmetic progression.

A precursor of Szemerédi's theorem was its color version, the van der Waerden theorem [26], asserting that for every $r, k \in \mathbb{N}$, for any $n > n_0$ in any partition $[n] = A_1 \cup A_2 \cup \dots \cup A_r$ some block $A_i$ contains a $k$-term arithmetic progression. Clearly, Szemerédi's theorem implies the van der Waerden theorem.

For $k, n \in \mathbb{N}$, the set $[k]^n$ consists of all $k^n$ $n$-tuples, called words, $x = (x_1, x_2, \dots, x_n)$ with $x_i \in [k]$. For $x \in [k+1]^n \setminus [k]^n$ and $i \in [k]$, we denote by $x(i)$ the word obtained from $x$ by replacing each occurrence of $k + 1$ by $i$. The $k$-element subset of $[k]^n$ formed by these words,

$$L(x) = \{x(i) \mid i \in [k]\},$$

is the combinatorial line (determined by $x$). In 1963 Hales and Jewett [10] proved that for every $r, k \in \mathbb{N}$, for any $n > n_0$ in any partition $[k]^n = A_1 \cup A_2 \cup \dots \cup A_r$ some block $A_i$ contains a combinatorial line. The stronger density version of this theorem was achieved by Furstenberg and Katznelson in 1991 [7] (they proved the special case $k = 3$ earlier in [6]) by ergodic methods, developed by Furstenberg [5] in his proof of Szemerédi's theorem. Thus, the density Hales-Jewett theorem asserts the following.

Theorem 2 For every $\delta > 0$ and $k \in \mathbb{N}$, there exists an $n_0 \in \mathbb{N}$ such that for every $n > n_0$ every set $A \subset [k]^n$ with $\mu(A) \ge \delta$ contains a combinatorial line.

We shall prove Theorem 2 following Polymath's proof in [20]. Theorem 2 implies Theorem 1, with the same $k$, by means of the bijection

$$f : [k]^n \to [k^n], \quad f(x) = f((x_1, x_2, \dots, x_n)) = 1 + \sum_{i=1}^{n} (x_i - 1) k^{i-1},$$

which sends combinatorial lines to $k$-term arithmetic progressions and, being a bijection, preserves densities; for the color versions of the theorems the simpler mapping $x \mapsto x_1 + x_2 + \dots + x_n$ suffices for the reduction.
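
The claim that $f$ sends combinatorial lines to arithmetic progressions can be checked mechanically for small parameters. The following Python sketch is only an illustration: the helper names `line_words`, `f` and `check_lines_map_to_progressions` are hypothetical, words are represented as tuples, and the letter $k + 1$ plays the role of the wildcard. It assumes $k \ge 2$.

```python
from itertools import product

def line_words(template, k):
    """Return the words x(1), ..., x(k) of the line L(x); the wildcard is the letter k+1."""
    return [tuple(i if c == k + 1 else c for c in template) for i in range(1, k + 1)]

def f(word, k):
    """The map [k]^n -> [k^n] given by 1 + sum_i (x_i - 1) * k^(i-1)."""
    return 1 + sum((c - 1) * k ** i for i, c in enumerate(word))

def check_lines_map_to_progressions(k, n):
    """Check (for k >= 2) that f is a bijection and maps every line to a k-term progression."""
    images = sorted(f(w, k) for w in product(range(1, k + 1), repeat=n))
    if images != list(range(1, k ** n + 1)):
        return False
    for template in product(range(1, k + 2), repeat=n):
        if k + 1 not in template:
            continue  # a template without the wildcard does not determine a line
        values = [f(w, k) for w in line_words(template, k)]
        d = values[1] - values[0]
        if d <= 0 or any(values[i + 1] - values[i] != d for i in range(k - 1)):
            return False
    return True
```

For instance, with $k = 3$ the template $(4, 1, 4)$ gives the words 111, 212, 313, whose images 1, 11, 21 form a progression with difference $1 + 3^2 = 10$.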

Multidimensional Szemerédi's theorem claims that for every $\delta > 0$, $r \in \mathbb{N}$ and finite set $H \subset \mathbb{N}^r$, there exists an $n_0 \in \mathbb{N}$ such that for every $n > n_0$ every set $A \subset [n]^r$ with $\mu(A) \ge \delta$ contains a copy of $H$ of the form $a + dH$, $a \in \mathbb{N}^r$, $d \in \mathbb{N}$. The particular case with $r = 2$ and $H = \{(1, 1), (1, 2), (2, 1)\}$ is the corner theorem, which was derived by Ajtai and Szemerédi [1] from Szemerédi's theorem. As explained in [20] and [5], the proof of Theorem 2 in [20] is inspired by and modelled after the density increment argument in [1].

2 The proof of Theorem 2



The combinatorial subspace $S$ of $[k]^n$ with dimension $d$, $d \le n$, determined by the word $x \in [k + d]^n$ such that each letter $k + 1, k + 2, \dots, k + d$ appears in $x$ at least once, is the $k^d$-element subset of $[k]^n$

$$S = S(x) = \{x(y) \mid y \in [k]^d\}$$

where $x(y)$ is the word obtained from $x$ by replacing each occurrence of $k + i$ by $y_i$, $i = 1, 2, \dots, d$. In other words, $S \subset [k]^n$ is a $d$-dimensional combinatorial subspace of $[k]^n$ if and only if there exist a word $z \in [k]^n$ and $d$ nonempty and disjoint subsets $X_l \subset [n]$, $1 \le l \le d$, such that

$$x \in S \iff x_i = z_i \text{ if } i \in [n] \setminus (X_1 \cup \dots \cup X_d) \text{ and } x_i = x_j \text{ if } i, j \in X_l.$$

The elements of $[n]$ in the union $X_1 \cup \dots \cup X_d$ are the free coordinates of $S$ and those not in it are the fixed coordinates of $S$. The 1-dimensional combinatorial subspaces are exactly the combinatorial lines. From now on we omit, for brevity, the word 'combinatorial' for subspaces and lines. The words in $[k + d]^n \setminus \bigcup_{i=k+1}^{k+d} ([k + d] \setminus \{i\})^n$ and the $d$-dimensional subspaces of $[k]^n$ correspond via the mapping $x \mapsto S(x)$, and this is a $d!$-to-one correspondence, as $S(x) = S(x')$ iff $x$ and $x'$ can be identified by permuting the letters $k + 1, k + 2, \dots, k + d$. The set of words $[k]^d$ and any $d$-dimensional subspace $S(x) \subset [k]^n$ are in bijection via $y \mapsto x(y)$. This bijection sends the $e$-dimensional subspaces of $[k]^d$, $e \le d$, to the $e$-dimensional subspaces of $[k]^n$ contained in $S(x)$, and this is in fact a bijection.
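
To make the definition of $S(x)$ concrete, here is a small Python sketch that lists the words of a subspace from its template word. The function name `subspace` and the tuple representation of words are assumptions made for this illustration only.

```python
from itertools import product

def subspace(template, k, d):
    """Return S(x) = {x(y) : y in [k]^d} for a template x over [k+d].

    The wildcard letter k+i is replaced by y_i; every wildcard must occur in x.
    """
    assert all(k + i in template for i in range(1, d + 1)), "each wildcard must occur"
    return {
        tuple(y[c - k - 1] if c > k else c for c in template)
        for y in product(range(1, k + 1), repeat=d)
    }
```

For $k = 3$, $d = 2$ the template $(4, 1, 5, 4)$ yields a 9-element subspace, and the template $(5, 1, 4, 5)$, obtained by permuting the two wildcards, yields the same set, in line with the $d!$-to-one correspondence.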

We capture the density increment argument by the next proposition. 

Proposition 3 There is a function

$$c = c(k, \delta) : \mathbb{N} \times (0, 1) \to (0, 1),$$

nondecreasing in $\delta$ for every $k$, such that for every $k, d \in \mathbb{N}$ and $\delta \in (0, 1)$, there is an $n_0$ such that for every $n > n_0$ and every set $A \subset [k]^n$ with $\mu(A) \ge \delta$ and containing no line, there exists a subspace $S \subset [k]^n$ with dimension $d$ and

$$\mu_S(A) \ge \mu(A) + c \ge \delta + c.$$

We fix $k \ge 2$ and derive Theorem 2 from Proposition 3. Suppose $\delta > 0$ is given and let $c = c(k, \delta) > 0$. By Proposition 3 for $d = 1$ there is an $n_0$ such that if $n > n_0$, $A \subset [k]^n$ has $\mu(A) \ge \delta$ and avoids lines, then we get (by the bijection between $S$ and $[k]^d$) a set $A' \subset [k]^d = [k]$ that has $\mu(A') \ge \delta + c$. For $d = n_0 + 1$ we have the conclusion for every $n > n_1$ for some $n_1$ and get $A' \subset [k]^d = [k]^{n_0+1}$ free of lines and with $\mu(A') \ge \delta + c$. We apply Proposition 3 to $A' \subset [k]^{n_0+1}$ again and get $A'' \subset [k]$ with $\mu(A'') \ge (\delta + c) + c = \delta + 2c$; the second increment is at least $c$ because $c(k, \cdot)$ is nondecreasing. We iterate the process and define inductively in a clear way numbers $n_1, n_2, \dots, n_t$ where $t = \lfloor 1/c \rfloor$. For $n > n_t$, every set $A \subset [k]^n$ with $\mu(A) \ge \delta$ contains a line, for else repeated applications of Proposition 3 produce at the end a subset of $[k]$ with density at least $\delta + (t + 1)c > 1$, which cannot exist.

The density increment $c$ of Proposition 3 arises in two steps, embodied in the next two propositions. For $k \ge 2$ and $i \in [k - 1]$, a set $D \subset [k]^n$ is called $ik$-insensitive if $x \in D \Rightarrow x' \in D$ for any word $x'$ obtained from $x$ by changing some occurrences of $k$ to $i$ or vice versa. If $D \subset S \subset [k]^n$, where $S$ is a $d$-dimensional subspace, we say that $D$ is $ik$-insensitive in $S$ if $D' \subset [k]^d$ is $ik$-insensitive, where $D'$ is the image of $D$ in the bijection between $S$ and $[k]^d$. A set $D \subset [k]^n$ is a $k$-set if $D = \bigcap_{i=1}^{k-1} D_i$ where each $D_i \subset [k]^n$ is $ik$-insensitive. We define $k$-sets $D \subset S$ in a $d$-dimensional subspace $S \subset [k]^n$ similarly, via the bijection between $S$ and $[k]^d$. In the next two propositions we may assume that $k \ge 3$ since they will be used for such $k$.
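
The notion of $ik$-insensitivity can be tested directly on small examples. The sketch below is an illustration under the assumption that words are tuples over $\{1, \dots, k\}$; the name `is_insensitive` is hypothetical. It checks only single-letter changes, which suffices because any change of several occurrences is a sequence of single changes, each of which stays inside $D$.

```python
from itertools import product

def is_insensitive(D, i, k):
    """Return True if the set D of words over [k] is ik-insensitive."""
    D = set(D)
    for x in D:
        for pos, c in enumerate(x):
            if c in (i, k):
                other = k if c == i else i
                # Flip one occurrence of i to k (or of k to i) and test membership.
                if x[:pos] + (other,) + x[pos + 1:] not in D:
                    return False
    return True
```

For example, the set of words in $[3]^3$ with exactly one occurrence of the letter 2 is 13-insensitive, since exchanging 1s and 3s does not change the number of 2s, but it is not 23-insensitive.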

Proposition 4 Let $k, d \in \mathbb{N}$ and $\varepsilon > 0$. There exists an $n_0$ such that for every $n > n_0$, every $k$-set $D \subset [k]^n$ has a partition

$$D = S_1 \cup \dots \cup S_t \cup F$$

into $d$-dimensional subspaces $S_i \subset [k]^n$ and a set $F \subset [k]^n$ with $\mu(F) < \varepsilon$.

Proposition 5 There is a function

$$\gamma = \gamma(k, \delta) : \mathbb{N} \times (0, 1) \to (0, 1),$$

nondecreasing in $\delta$ for every $k$, such that for every $k, r \in \mathbb{N}$ and $\delta \in (0, 1)$, there is an $n_0$ such that for every $n > n_0$ and every set $A \subset [k]^n$ with $\mu(A) \ge \delta$ and containing no line, there exist an $r$-dimensional subspace $W \subset [k]^n$ and a $k$-set $D \subset W$ in $W$ satisfying

$$\mu_W(D) \ge \gamma \quad \text{and} \quad \mu_D(A) \ge \mu(A) + \gamma \ge \delta + \gamma.$$

We fix $k \ge 2$ and derive Proposition 3 from Propositions 5 and 4. Let $d$ and $\delta$ be given. We set $\gamma = \gamma(k, \delta) > 0$ and take the $n_0$ of Proposition 4 corresponding to $k$, $d$ and $\varepsilon = \gamma^2/2$. Then we take $n_1$ such that for $n > n_1$ Proposition 5 holds with $k$, $r = n_0 + 1$ and $\delta$. Now let $n > n_1$ and suppose a set $A \subset [k]^n$ with $\mu(A) \ge \delta$ and free of lines is given. There exist a subspace $W$ and a $k$-set $D \subset W$ as in Proposition 5 such that $\mu_W(D) \ge \gamma$, $\mu_D(A) \ge \mu(A) + \gamma$ and $W$ has dimension $n_0 + 1$. Thus $D$ partitions as in Proposition 4, with $[k]^{n_0+1}$, corresponding to $W$, in place of the $[k]^n$ in Proposition 4, and with $\varepsilon = \gamma^2/2$. Let $D = E \cup F$ be a partition where $E$ is a disjoint union of $d$-dimensional subspaces of $W$ and $\mu_W(F) < \varepsilon$. Since $\mu_D(A) \ge \mu(A) + \gamma$, $\mu_D(F) = \mu_W(F)/\mu_W(D) < \varepsilon/\gamma = \gamma/2$ and $\mu_F(A) \le 1$, we get $\mu_E(A) \ge \mu(A) + \gamma/2$. By averaging, there is a $d$-dimensional subspace $S$ of $W$ (contained in $E$) with $\mu_S(A) \ge \mu(A) + \gamma/2$. Proposition 3 follows, with $c(k, \delta) = \gamma(k, \delta)/2$.

Thus to prove Theorem 2 it suffices to deduce Propositions 5 and 4. We shall proceed by induction on $k$. We start by proving Theorem 2 for $k = 2$ and then for every $k \ge 3$ derive Propositions 5 and 4 from the validity of Theorem 2 for $k - 1$. The derivations rely on formally stronger but equivalent forms of Theorem 2, Propositions 7 and 14. We get the implications

$$\mathrm{T2}_2 \Rightarrow \mathrm{P4}_3 \,\&\, \mathrm{P5}_3 \Rightarrow \mathrm{P3}_3 \Rightarrow \mathrm{T2}_3 \Rightarrow \mathrm{P4}_4 \,\&\, \mathrm{P5}_4 \Rightarrow \mathrm{P3}_4 \Rightarrow \mathrm{T2}_4 \Rightarrow \dots,$$

which establish Theorem 2 for every $k \ge 2$. We start with the easy case $k = 2$ and then prepare some results for the derivation of Propositions 5 and 4.

The words of $[2]^n$ correspond one-to-one to the subsets $X \subset [n]$, and the lines correspond one-to-one to the inclusion pairs: pairs $X \subset Y \subset [n]$ with $Y \ne X$. Thus Theorem 2 for $k = 2$ follows from the next classical result, Sperner's theorem [21].

Proposition 6 If $F$ is a family of subsets of $[n]$ containing no inclusion pair (i.e., $F$ is an antichain with respect to $\subset$) then

$$|F| \le \binom{n}{\lfloor n/2 \rfloor} \le \frac{2^n}{\sqrt{n}}.$$

Proof. Let $F$ be an antichain of subsets of $[n]$. The maximal chains $\{X_0 \subset X_1 \subset \dots \subset X_n = [n]\}$, $|X_i| = i$, correspond one-to-one to the $n!$ permutations $\pi$ of $[n]$ via $\pi \mapsto C_\pi = \{\emptyset, \{\pi(1)\}, \{\pi(1), \pi(2)\}, \dots, \pi([n]) = [n]\}$. We double count the pairs $(\pi, X)$ such that $X \in F \cap C_\pi$. Grouping the pairs by $\pi$ we get that their number is $\le n!$, as $|F \cap C_\pi| \le 1$ for each $\pi$. Grouping them by $X$ we get that their number is exactly $\sum_{X \in F} |X|!\,(n - |X|)!$, since the summand equals the number of $\pi$ with $X \in C_\pi$. Hence

$$\sum_{X \in F} |X|!\,(n - |X|)! \le n!.$$

Since $\binom{n}{j} \le \binom{n}{\lfloor n/2 \rfloor}$ for any $0 \le j \le n$, we have $\lfloor n/2 \rfloor!\,(n - \lfloor n/2 \rfloor)! \le |X|!\,(n - |X|)!$, and dividing by $\lfloor n/2 \rfloor!\,(n - \lfloor n/2 \rfloor)!$ yields the stated inequality. □
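
Sperner's bound is easy to confirm by exhaustive search for very small $n$. The following Python sketch is a brute-force illustration and not part of the proof; the names `sperner_bound` and `max_antichain_size` are hypothetical, and subsets of $[n]$ are encoded as bitmasks. The search visits every antichain once, so it is feasible only for roughly $n \le 5$.

```python
from math import comb

def sperner_bound(n):
    """The middle binomial coefficient, the bound of Sperner's theorem."""
    return comb(n, n // 2)

def max_antichain_size(n):
    """Largest family of subsets of [n] with no inclusion pair, by backtracking."""
    sets = list(range(1 << n))  # subsets of [n] encoded as bitmasks
    best = 0

    def extend(start, chosen):
        nonlocal best
        best = max(best, len(chosen))
        for idx in range(start, len(sets)):
            s = sets[idx]
            # s and t are comparable exactly when their intersection is one of them.
            if all((s & t) != s and (s & t) != t for t in chosen):
                chosen.append(s)
                extend(idx + 1, chosen)
                chosen.pop()

    extend(0, [])
    return best
```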

One may generalize Theorem 2 to subspaces, but this is not really stronger than the original theorem.

Proposition 7 Let $k \in \mathbb{N}$, $k \ge 2$, be given. Assuming Theorem 2 for $k$, it follows that for every $\delta > 0$ and $d \in \mathbb{N}$, there exists an $n_0 \in \mathbb{N}$ such that for every $n > n_0$ every set $A \subset [k]^n$ with $\mu(A) \ge \delta$ contains a $d$-dimensional subspace.

Proof. We proceed by induction on $d$, where the case $d = 1$ is Theorem 2. Now suppose that $d \ge 2$ and the result holds for $d - 1$ (and every $\delta$). Observe that if $n = n_1 + n_2$, $n_i \in \mathbb{N}$, and $A \subset [k]^n$ with $\mu(A) \ge \delta$, then

$$\mu(A_1) \ge \delta/2 \quad \text{for} \quad A_1 = \{x \in [k]^{n_1} \mid \mu(\{y \in [k]^{n_2} \mid (x, y) \in A\}) \ge \delta/2\}$$

(interpreting $(x, y)$ in the obvious way as an element of $[k]^n$). Let $\delta > 0$ be given. We take an $n_2$ such that the result holds (with $n = n_2$) for $d - 1$ and density $\delta/2$, and then take an $n_1$ such that the conclusion of Theorem 2 holds for every $n > n_1$, with density $\delta / 2(k + d - 1)^{n_2}$. Suppose that $n > n_1 + n_2$ and $A \subset [k]^n$ has $\mu(A) \ge \delta$. Then, using the observation, the inductive assumption and the pigeonhole principle, we get a set $A_1 \subset [k]^{n - n_2}$ with $\mu(A_1) \ge \delta / 2(k + d - 1)^{n_2}$ and a $(d - 1)$-dimensional subspace $S \subset [k]^{n_2}$ such that $(x, y) \in A$ for every $x \in A_1$ and $y \in S$. By Theorem 2, $A_1$ contains a line $L$. Hence $\{(x, y) \mid x \in L, y \in S\}$ is the desired $d$-dimensional subspace contained in $A$. □

We will use the fact that almost all words in $[k]^n$ have almost precisely $n/k$ occurrences of each of the $k$ letters.

Proposition 8 Let $k, n \in \mathbb{N}$, $j \in [k]$ and $A \subset [k]^n$ be the set of words with the number of occurrences of $j$ outside the interval $[n/k - n^{2/3}, n/k + n^{2/3}]$. Then $\mu(A) < n^{-1/3}$.

Proof. (We say more on the tools used in Subsection 2.2.) For $i \in [n]$ and $x \in [k]^n$, let $f_i(x) = 1$ if $x_i = j$ and $f_i(x) = 0$ else. Then the function $f = f_1 + \dots + f_n$ counts the occurrences of $j$ in $x$, has mean $P = n/k$ (the sum of the means of the $f_i$) and variance $V = (n/k)(1 - 1/k) < n$ ($V$ is the mean of $f^2$ minus the square of the mean of $f$, which by linearity of means and independence of the $f_i$ gives $n/k + n(n - 1)/k^2 - (n/k)^2$). By Čebyšev's inequality, $\mu(\{x \in [k]^n \mid |f(x) - P| \ge \lambda\sqrt{V}\}) \le \lambda^{-2}$ for any $\lambda > 0$. Setting $\lambda = n^{1/6}$ gives the result. □
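
The bound of Proposition 8 can be compared with the exact density of the exceptional set, which is a binomial tail. The Python sketch below is an illustration only; the name `tail_density` is hypothetical. To avoid floating-point rounding, the condition $|c - n/k| > n^{2/3}$ is tested in the equivalent integer form $|kc - n|^3 > k^3 n^2$.

```python
from fractions import Fraction
from math import comb

def tail_density(n, k):
    """Exact density of the words in [k]^n in which a fixed letter occurs c times
    with c outside the interval [n/k - n^(2/3), n/k + n^(2/3)]."""
    bad = sum(
        comb(n, c) * (k - 1) ** (n - c)  # words with exactly c occurrences of the letter
        for c in range(n + 1)
        if abs(k * c - n) ** 3 > k ** 3 * n ** 2
    )
    return Fraction(bad, k ** n)
```

The inequality $\mu(A) < n^{-1/3}$ is equivalent to $\mu(A)^3 n < 1$, which can be checked exactly on the returned fraction, for example as `tail_density(64, 3) ** 3 * 64 < 1`.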

For $k, n \in \mathbb{N}$, we have on $[k]^n$ the uniform density $\mu$, given by $\mu(\{x\}) = 1/k^n$. For $k \ge 2$ and the parameter $m \le n$, we define another, non-uniform, density $\mu'_m$ on $[k]^n$ by

$$\mu'_m(\{x\}) = \frac{|\{(J, y, z) \in M \mid (J, y, z) = x\}|}{|M|}$$

for

$$M = \{(J, y, z) \mid J \subset [n],\ |J| = m,\ y \in [k - 1]^J,\ z \in [k]^{[n] \setminus J}\}$$

($A^B$ denotes, for sets $A$ and $B$, the set of all mappings from $B$ to $A$), where any triple $(J, y, z)$ in $M$ is interpreted as $x \in [k]^n$ by setting $x_i = y_i$ if $i \in J$ and $x_i = z_i$ else. (We say more about densities in Subsection 2.2.)

Proposition 9 If $\eta \in (0, 1)$ and $k, m, n \in \mathbb{N}$ satisfy $k \ge 2$, $n \ge (12k/\eta)^{12}$ and $m \le n^{1/4}$, then for every set $A \subset [k]^n$ we have $|\mu'_m(A) - \mu(A)| \le \eta$.

Proof. This is a particular case of the more general Proposition 15, which we prove later. □

To deduce Proposition 4 we need Propositions 7 and 9.

2.1 Derivation of Proposition 4

In this subsection we fix a $k \in \mathbb{N}$, $k \ge 3$, assume that Theorem 2 holds for $k - 1$ (and every $\delta > 0$) and deduce from this Proposition 4 for $k$. The main step is to get the required partition if $D$ is an $ik$-insensitive set for an $i \in [k - 1]$; the full proposition follows inductively by iteration. We may of course set $i = 1$.

Proposition 10 For every $d \in \mathbb{N}$ and $\varepsilon > 0$, there exists an $n_0$ such that for every $n > n_0$ every $1k$-insensitive set $D \subset [k]^n$ has a partition

$$D = S_1 \cup \dots \cup S_t \cup F$$

into $d$-dimensional subspaces $S_i \subset [k]^n$ and a set $F \subset [k]^n$ with $\mu(F) < \varepsilon$.

Proof. Let $d \in \mathbb{N}$ and $\varepsilon > 0$ be given. Applying Theorem 2 for $k - 1$ and Proposition 7, we take an $m \in \mathbb{N}$, $m \ge d$, such that every set $A \subset [k - 1]^m$ with $\mu(A) \ge \varepsilon/3$ contains a $d$-dimensional subspace. We set

$$n_0 = \lceil 3\varepsilon^{-1} m k^{m-d} (k + d)^m + m^4 + (36k/\varepsilon)^{12} \rceil.$$

Let $n > n_0$ and $D \subset [k]^n$ be a $1k$-insensitive set. We may assume that $\mu(D) \ge \varepsilon$, for else we set at once $F = D$. We construct, for $r = 0, 1, \dots$, sets $D = D_0 \supset D_1 \supset \dots \supset D_r$ and $\emptyset = J_0 \subset J_1 \subset \dots \subset J_r \subset [n]$, $|J_j| = jm$, with the properties that (i) for each $x \in [k]^{J_r}$, the set

$$(D_r)_x = \{y \in [k]^{[n] \setminus J_r} \mid (x, y) \in D_r\}$$

is $1k$-insensitive, (ii) $D \setminus D_r$ partitions into $d$-dimensional subspaces and (iii) $\mu(D_j \setminus D_{j+1}) \ge \varepsilon k^{d-m} (k + d)^{-m}/3$ for $j = 0, 1, \dots, r - 1$. Such sets trivially exist if $r = 0$, namely $D_r = D$ and $J_r = \emptyset$. We claim that as long as $\mu(D_r) \ge \varepsilon$, the construction can be continued. This establishes the proposition: since $r \le 3\varepsilon^{-1} k^{m-d} (k + d)^m$ (by (iii)), the construction has to terminate for some $r$ (by the definition of $n_0$, $n$ is so large that without terminating we hit the contradiction $\mu(D) > 1$), and then $\mu(D_r) < \varepsilon$ and $D \setminus D_r$ partitions into $d$-dimensional subspaces.

To prove the claim we assume that $\mu(D_r) \ge \varepsilon$, which is true if $r = 0$. In the initial step, when $r = 0$ and $J_r = \emptyset$, we modify the following construction, which is described for the general step, accordingly by omitting the $x$-coordinate. The (uniform) average of the values $\mu((D_r)_x)$, taken over all $x \in [k]^{J_r}$, equals $\mu(D_r)$ and so is at least $\varepsilon$. Hence the same average of $\mu'_m((D_r)_x)$, where $\mu'_m$ is the (non-uniform) density on $[k]^{[n] \setminus J_r}$ introduced before Proposition 9, is at least $\varepsilon - \eta = 2\varepsilon/3$, due to Proposition 9 with $\eta = \varepsilon/3$. In other words, the density of the subset of the quadruples $(x, J, y, z)$, where $x \in [k]^{J_r}$, $J \subset [n] \setminus J_r$, $|J| = m$, $y \in [k - 1]^J$ and $z \in [k]^{[n] \setminus (J_r \cup J)}$, satisfying $(x, J, y, z) \in D_r$, in the set of all quadruples, is at least $2\varepsilon/3$. Hence there is a $J$ such that the density of the triples $(x, y, z)$ (from the stated domains) with $(x, J, y, z) \in D_r$ is at least $2\varepsilon/3$. And for this fixed $J$, the density of the pairs $(x, z)$ for which $\mu(\{y \in [k - 1]^J \mid (x, J, y, z) \in D_r\}) \ge \varepsilon/3$ is at least $\varepsilon/3$. We set

$$J_{r+1} = J_r \cup J.$$

By the choice of $m$, each of these sets of words $y$ contains a $d$-dimensional subspace $U'_{x,z} \subset [k - 1]^J$. Since $(D_r)_x$ is $1k$-insensitive, for this $J$ and for each of these pairs $(x, z)$ there is a $d$-dimensional subspace $U_{x,z} \subset [k]^J$ such that $(x, y, z) \in D_r$ for every $y \in U_{x,z}$. By the pigeonhole principle, there is a single $d$-dimensional subspace $U \subset [k]^J$ such that the set

$$T = \{(x, z) \in [k]^{J_r} \times [k]^{[n] \setminus J_{r+1}} \mid (x, y, z) \in D_r \text{ for every } y \in U\}$$

has density at least $\varepsilon/3(k + d)^m$. Note that for each $x$ the set of $z$ with $(x, z) \in T$ is $1k$-insensitive. We set

$$D_{r+1} = D_r \setminus (T \times U)$$

where $T \times U$ means the words in $[k]^n$ that restrict, for some $(x, z) \in T$ and $y \in U$, on $J_r$ to $x$, on $[n] \setminus J_{r+1}$ to $z$ and on $J$ to $y$. Clearly, (ii) holds because $T \times U = D_r \setminus D_{r+1}$ is a disjoint union of $d$-dimensional subspaces. Property (i) holds too because for every $x \in [k]^{J_r}$ and $y \in [k]^J$, the set of $z \in [k]^{[n] \setminus J_{r+1}}$ with $(x, y, z) \in D_{r+1}$ is $1k$-insensitive. (Consider $u = (x, y, z) \in D_{r+1}$ and $u' = (x, y, z')$ in which $z'$ arises from $z$ by some exchanges of 1s and $k$s. Then $u' \in D_r$ by the $1k$-insensitivity of $(D_r)_x$. If $u' \in T \times U$ then $y \in U$ and $(x, z') \in T$, hence $(x, z) \in T$ as noted above, and $u \in T \times U$, which is not the case. So $u' \notin T \times U$ and $u' \in D_{r+1}$.) Finally, the density of $T \times U$ in $[k]^n$ equals the density of $T$ in $[k]^{J_r} \times [k]^{[n] \setminus J_{r+1}}$ times the density of $U$ in $[k]^J$, which is at least $\varepsilon/(3(k + d)^m k^{m-d})$. Thus $J_{r+1}$ and $D_{r+1}$ have the required properties (i)-(iii). □

Remark. The decrease of the density of $T \times U$ compared to $T$, caused by the density of $U$, seems to be overlooked by Polymath: they claim [20, bottom of p. 1320] that $T \times U$ has density at least $\eta (k + d)^{-m}$ (i.e., $\varepsilon/3(k + d)^m$ in our notation), which is reflected in the statement of [20, Lemma 8.1].

We prove Proposition 4. We proceed by induction on the size of the intersection defining $D$. Let $j \in [k - 1]$. We assume that for every $d$ and $\varepsilon > 0$ Proposition 4 holds for all sets of the form $D = D_1 \cap D_2 \cap \dots \cap D_j$ where $D_i \subset [k]^n$ is $ik$-insensitive; for $j = 1$ this is true by Proposition 10. From this we deduce (if $j < k - 1$) that Proposition 4 holds for all sets $D$ corresponding to the increased parameter value $j + 1$. For $j = k - 1$ we get the original Proposition 4.

So let $d$ and $\varepsilon > 0$ be given. We take $n_0$ such that for every $n > n_0$ our inductive assumption (for $j < k - 1$) holds for subspaces of dimension $d$ and the bound $\varepsilon/2$ on the density of the residual set. Then we take $n_1$ such that for every $n > n_1$ the conclusion of Proposition 10 holds for subspaces of dimension $n_0 + 1$ and the bound $\varepsilon/2$ on the density of the residual set. Now suppose that $n > n_1$ and $D = D_1 \cap \dots \cap D_{j+1}$ where $D_i \subset [k]^n$ is $ik$-insensitive. Using Proposition 10 we obtain a partition $D_{j+1} = T_1 \cup \dots \cup T_s \cup F$ such that the $T_i$ are $(n_0 + 1)$-dimensional subspaces and $\mu(F) < \varepsilon/2$. We have

$$D = \bigcap_{i=1}^{j+1} D_i = \bigcup_{i=1}^{s} (T_i \cap D_1 \cap \dots \cap D_j) \cup (F \cap D_1 \cap \dots \cap D_j).$$

Clearly, each $T_i \cap D_l$ is $lk$-insensitive in $T_i$ and thus, using the inductive assumption for $j$, we can express (working in $[k]^{n_0+1}$ via the bijection with $T_i$) each set $T_i \cap D_1 \cap \dots \cap D_j = (T_i \cap D_1) \cap \dots \cap (T_i \cap D_j)$ as a disjoint union of $d$-dimensional subspaces (in $T_i$ and thus in $[k]^n$) and a residual set $F_i$ with $\mu_{T_i}(F_i) < \varepsilon/2$. These subspaces, taken for all $i = 1, 2, \dots, s$, and the set $E = F_1 \cup \dots \cup F_s \cup (F \cap D_1 \cap \dots \cap D_j)$ form the desired partition of $D$ because $\mu(E) \le \mu(F_1 \cup \dots \cup F_s) + \mu(F) < \max_i \mu_{T_i}(F_i) + \varepsilon/2 < \varepsilon$. This concludes the derivation of Proposition 4.

2.2 The equal-slices densities $\nu$ and $\tilde\nu$

We move to the second and more complicated half of the proof of Theorem 2, the derivation of Proposition 5 for $k$ from Theorem 2 for $k - 1$. Similarly to the role of Proposition 7 in the first half of the proof, we need a stronger version of Theorem 2, Proposition 14, which says that any positively dense set $A \subset [k]^n$ contains, for large $n$, a set of lines with positive density. However, Proposition 8 shows that this cannot hold for the uniform density. Consider the set $A \subset [2]^n$ of words in which the numbers of occurrences of 1 and 2 deviate from $n/2$ by less than $n^{2/3}$. Then $\mu(A) \to 1$ as $n \to \infty$ but at the same time $\mu(M) \to 0$ for the set $M \subset [3]^n$ of lines contained in $A$, because the inclusion $L(x) \subset A$, $x \in [3]^n$, forces $x$ to have at most $2n^{2/3}$ occurrences of 3, and such $x$ have in $[3]^n$ density going to 0. Fortunately, the strengthening holds for a non-uniform density, the equal-slices density $\nu$ that we define in a moment, and one can go from the uniform to the equal-slices density and back. Since $\nu$ does not behave well under restrictions to subspaces, we need to work also with a variant density $\tilde\nu$ fixing this problem, which for large $n$ differs from $\nu$ only a little. We begin by discussing densities in general and then introduce the densities $\nu$ and $\tilde\nu$.

A density on a finite set $B \ne \emptyset$ is a mapping $\mu'$ from the set of all subsets of $B$ to the interval $[0, 1]$, such that $\mu'(B) = 1$ and $\mu'(A \cup A') = \mu'(A) + \mu'(A')$ whenever $A, A' \subset B$ and $A \cap A' = \emptyset$. Thus $\mu'(\emptyset) = 0$ and $\mu'$ is uniquely determined by its values on singletons. Any choice of values $\mu'(\{x\}) \ge 0$, $x \in B$, with $\sum_{x \in B} \mu'(\{x\}) = 1$ gives a density: $\mu'(A) = \sum_{x \in A} \mu'(\{x\})$ for any $A \subset B$. We have been using the uniform density $\mu$, defined by $\mu(\{x\}) = 1/|B|$ for any $x \in B$, and before Proposition 9 we met the non-uniform density $\mu'_m$. We reserve the letter $\mu$ for the uniform density and the primed $\mu'$ for a general, possibly non-uniform, density.

Suppose $B$ is a finite set with a density $\mu'$. If $f : B \to \mathbb{R}$, the average, or mean, of the function $f$ (with respect to $\mu'$) is

$$\sum_{x \in B} f(x)\, \mu'(\{x\}).$$

We recall a few useful properties of averages, which we already used in the proof of Proposition 8. Linearity: if $f_i : B \to \mathbb{R}$, $i = 1, 2$, have means $P_i$ and $a, b \in \mathbb{R}$, then $af_1 + bf_2$ has mean $aP_1 + bP_2$. If $f_1$ and $f_2$ are independent, which means that $\mu'(f_1^{-1}(c) \cap f_2^{-1}(d)) = \mu'(f_1^{-1}(c)) \cdot \mu'(f_2^{-1}(d))$ for any two values $c, d \in \mathbb{R}$, then the mean of $f_1 f_2$ equals $P_1 P_2$. If $f$ has average at least (at most) $c$ then $f(x) \ge c$ ($f(x) \le c$) for some $x \in B$. Markov's inequality: If $f \ge 0$ has mean $P$ and $\lambda > 0$, then $\mu'(\{x \in B \mid f(x) \ge \lambda P\}) \le \lambda^{-1}$. Applying it to the function $(f(x) - P)^2$ we get Čebyšev's inequality: If $V$, the variance of $f$, is the mean of $(f(x) - P)^2$ (where $P$ is the mean of $f$), then $\mu'(\{x \in B \mid |f(x) - P| \ge \lambda\sqrt{V}\}) \le \lambda^{-2}$ for any $\lambda > 0$. We do not need any stronger result on the concentration of $f$ around its mean.

If $f : C \to B$ is a mapping and $\mu'$ a density on $C$, we get a density $\mu''$ on $B$ by setting

$$\mu''(\{x\}) = \mu'(f^{-1}(x)) = \sum_{c \in C,\ f(c) = x} \mu'(\{c\}).$$

We refer to this as projection. Another construction of more complicated densities from simpler ones takes a family of sets $B_i$, $i \in I$ (all of them finite), with a density $\mu'_1$ on $I$ and densities $\mu''_i$ on the sets $B_i$, and defines

$$\mu'(\{(i, b)\}) = \mu'_1(\{i\}) \cdot \mu''_i(\{b\}), \quad i \in I,\ b \in B_i.$$

Then $\mu'$ is a density on the disjoint union $\bigcup_{i \in I} B_i$, the set of all pairs $(i, b)$ with $i \in I$ and $b \in B_i$. We call this construction, which generalizes to triples etc., a higher-dimensional density. Both constructions can be combined: to define a non-uniform density on $B$, one takes a higher-dimensional density, often patched from uniform densities, and projects it to $B$.

Let us describe one such situation that we already encountered in the proof of Proposition 10 and will encounter again. Suppose that $\mu'$ is the higher-dimensional density on $C = \bigcup_{i \in I} B_i$ coming from the densities $\mu'_1$ on $I$ and $\mu''_i$ on $B_i$, that $f : C \to B$ is a mapping that is injective for each fixed $i$ (each $B_i$ then can be regarded as a subset of $B$) and that $\mu''$ is the projection of $\mu'$ to $B$ via $f$. Then for each $A \subset B$, the value $\mu''(A)$ in fact equals the average of the function $i \mapsto \mu''_i(B_i \cap A)$ with respect to $\mu'_1$.
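
The two constructions, projection and higher-dimensional density, are simple enough to write out for densities stored as dictionaries. The following Python sketch is an illustration under that assumed representation (a density is a dictionary from points to exact weights); the names `project` and `higher_dimensional` are hypothetical.

```python
from collections import defaultdict
from fractions import Fraction

def project(density, f):
    """Project a density on C (dict: point -> weight) to B along a mapping f: C -> B."""
    image = defaultdict(Fraction)  # missing points start with weight 0
    for c, weight in density.items():
        image[f(c)] += weight
    return dict(image)

def higher_dimensional(index_density, fibre_densities):
    """Density on the pairs (i, b): the weight of i times the weight of b in the i-th set."""
    return {
        (i, b): wi * wb
        for i, wi in index_density.items()
        for b, wb in fibre_densities[i].items()
    }
```

For example, take $I = \{0, 1\}$ with weights $1/2$, $B_0 = \{a, b\}$ and $B_1 = \{b, c\}$ with uniform densities, and project along $(i, b) \mapsto b$. The point $b$ then gets weight $1/2 \cdot 1/2 + 1/2 \cdot 1/2 = 1/2$, which is the average over $i$ of the weight of $b$ in $B_i$, as in the averaging remark above.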

Important densities live on the sets of words $[k]^n$. The $n!$ permutations of $[n]$ act on the coordinates of $[k]^n$ and produce a partition $[k]^n = \bigcup_{r \in I} O_r$ into orbits, or slices, where $O_r$ consists of the words that have $r_j$ occurrences of each letter $j \in [k]$ and $I$ is the $\binom{n+k-1}{k-1}$-element set of $k$-tuples $r = (r_1, \dots, r_k) \in \mathbb{N}_0^k$, $\sum r_j = n$, recording these numbers. The equal-slices density $\nu$ on $[k]^n$ is the unique density satisfying $\nu(\{x\}) = \nu(\{y\})$ if $x, y \in O_r$ and $\nu(O_r) = \nu(O_s)$ for any $r, s \in I$. Explicitly,

$$\nu(\{x\}) = \frac{1}{\binom{n+k-1}{k-1} \binom{n}{r_1, r_2, \dots, r_k}}$$

for $x \in [k]^n$ with $r_j$ occurrences of $j$. We reserve the letter $\nu$ for the equal-slices densities and refer to the uniform and equal-slices densities as $\mu$-density and $\nu$-density, respectively. If $S \subset [k]^n$ is a $d$-dimensional subspace and $A \subset [k]^n$ then $\nu_S(A)$ is defined as $\nu'(A')$, where $A' \subset [k]^d$ is the image of $A \cap S$ in the bijection between $S$ and $[k]^d$ and $\nu'$ is the equal-slices density on $[k]^d$ (this in general differs from $\nu(A \cap S)/\nu(S)$, whereas for $\mu$-density both ways of relativising density to subspaces give the same result).

A slice $O_r$, $r = (r_1, \dots, r_k)$, is degenerate if $r_j = 0$ for some $j$, and is non-degenerate else (then each letter $j \in [k]$ occurs in the words of $O_r$); there are $\binom{n-1}{k-1}$ non-degenerate slices. The non-degenerate equal-slices density $\tilde\nu$ on $[k]^n$, for $n \ge k$ (else all slices are degenerate), is obtained from $\nu$ by setting $\tilde\nu(O_r) = 0$ for every degenerate slice and rescaling $\nu$ accordingly (by the factor $(1 - \nu(D))^{-1}$, where $D$ is the union of degenerate slices) on the union of non-degenerate slices. So

$$\tilde\nu(\{x\}) = \frac{1}{\binom{n-1}{k-1} \binom{n}{r_1, r_2, \dots, r_k}}$$

if $x \in [k]^n$ has $r_j \ge 1$ occurrences of $j$ for each $j \in [k]$, and $\tilde\nu(\{x\}) = 0$ if $r_j = 0$ for some $j$.
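
Both equal-slices densities are given by closed formulas, so they can be evaluated exactly. The Python sketch below is an illustration only: the function names are hypothetical, words are tuples over $\{1, \dots, k\}$, and exact rational arithmetic is used so that total masses can be compared with 1 without rounding.

```python
from fractions import Fraction
from itertools import product
from math import comb, factorial

def multinomial(counts):
    """Multinomial coefficient (sum of counts)! / (product of count!)."""
    result = factorial(sum(counts))
    for c in counts:
        result //= factorial(c)
    return result

def letter_counts(word, k):
    return [word.count(j) for j in range(1, k + 1)]

def equal_slices(word, k):
    """The equal-slices density of the single word, on [k]^n with n = len(word)."""
    n = len(word)
    return Fraction(1, comb(n + k - 1, k - 1) * multinomial(letter_counts(word, k)))

def nondegenerate_equal_slices(word, k):
    """The non-degenerate equal-slices density of the single word (n >= k assumed)."""
    n = len(word)
    counts = letter_counts(word, k)
    if min(counts) == 0:
        return Fraction(0)  # degenerate words get weight zero
    return Fraction(1, comb(n - 1, k - 1) * multinomial(counts))

def total_mass(density, k, n):
    """Sum of a word density over all of [k]^n; it should equal 1."""
    return sum(density(w, k) for w in product(range(1, k + 1), repeat=n))
```

For instance, `total_mass(equal_slices, 3, 4)` and `total_mass(nondegenerate_equal_slices, 3, 4)` can be used to confirm that both formulas define densities on $[3]^4$.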

For $n, d, k \in \mathbb{N}$ and two words $y \in [d]^n$, $z \in [k]^d$, we define their composition $y * z$ as the word $x \in [k]^n$ given by $x_i = z_{y_i}$, $i = 1, 2, \dots, n$. Suppose that $y \in [d]^n$ is non-degenerate (hence $n \ge d$). Clearly, $\{y * z \mid z \in [k]^d\} \subset [k]^n$ is the $d$-dimensional subspace $S(y')$, where $y' \in [k + d]^n$ is obtained from $y$ by replacing the letter $j$ with $k + j$ (note that $S(y')$ has no fixed coordinate), and in the factorization $x = y * z$ the word $z$ is uniquely determined by $y$; the equality $x = y * z$ captures the way of determining $x$ by selecting a subspace $S(y')$ containing $x$ and then selecting 'in' $S(y')$ the word $z$ corresponding to $x$. Note that if $L = L(z') \subset [k]^d$, $z' \in [k + 1]^d$, is a line, then $\{y * z \mid z \in L\} = L(y * z') \subset [k]^n$ is a line too. For $n \ge d$ and $M = [d]^n \times [k]^d$, we define a new density $\nu'_d$ on $[k]^n$ by

$$\nu'_d(\{x\}) = \sum_{(y, z) \in M,\ y * z = x} \tilde\nu_1(\{y\}) \cdot \tilde\nu_2(\{z\})$$

where $\tilde\nu_1$ (resp. $\tilde\nu_2$) is the non-degenerate equal-slices density on $[d]^n$ (resp. on $[k]^d$); we may clearly assume that in the sum $y$ is non-degenerate. Below we show that $\nu'_d = \tilde\nu$. Before that we demonstrate that by replacing the densities $\tilde\nu_i$ in the definition of $\nu'_d$ with $\nu_i$ (and keeping $y$ in the sum non-degenerate), we obtain a density $\nu''_d$ distinct from $\nu$. Indeed, for $n = d = k = 2$ and $x = 11$, the two factorizations $11 = 12 * 11 = 21 * 11$ give $\nu''_2(\{x\}) = 2 \left(3 \binom{2}{1,1}\right)^{-1} \left(3 \binom{2}{2,0}\right)^{-1} = 1/9$, but $\nu(\{x\}) = \left(3 \binom{2}{2,0}\right)^{-1} = 1/3$.
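
The composition $y * z$ and the density obtained by summing over factorizations can be explored by brute force for small parameters. The Python sketch below is an illustration, not part of the argument; the names `slice_weight`, `compose` and `composed_density` are hypothetical, and the flag `nondegenerate` switches between the non-degenerate and the plain equal-slices weights.

```python
from fractions import Fraction
from itertools import product
from math import comb, factorial

def slice_weight(word, letters, nondegenerate):
    """Equal-slices weight of one word over [letters]; optionally the non-degenerate variant."""
    n = len(word)
    counts = [word.count(j) for j in range(1, letters + 1)]
    if nondegenerate and min(counts) == 0:
        return Fraction(0)
    multinomial = factorial(n)
    for c in counts:
        multinomial //= factorial(c)
    slices = comb(n - 1, letters - 1) if nondegenerate else comb(n + letters - 1, letters - 1)
    return Fraction(1, slices * multinomial)

def compose(y, z):
    """The composition y * z: the word x with x_i = z_{y_i} (letters are 1-based)."""
    return tuple(z[c - 1] for c in y)

def composed_density(x, k, d, nondegenerate=True):
    """Sum of weight(y) * weight(z) over factorizations x = y * z with y non-degenerate."""
    total = Fraction(0)
    for y in product(range(1, d + 1), repeat=len(x)):
        if len(set(y)) < d:
            continue  # keep only words y that use all d letters
        for z in product(range(1, k + 1), repeat=d):
            if compose(y, z) == x:
                total += slice_weight(y, d, nondegenerate) * slice_weight(z, k, nondegenerate)
    return total
```

With `nondegenerate=False` and $n = d = k = 2$, the word 11 gets the value $1/9$ computed above, while its equal-slices density is $1/3$. With the default setting, the computed values agree with the non-degenerate equal-slices density in the small cases tried, for example for all words of $[2]^4$ with $d = 3$.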

Proposition 11 Let $k, n \in \mathbb{N}$ and $\nu$ be the equal-slices density and $\tilde\nu$ the non-degenerate equal-slices density on $[k]^n$.

1. If $m \in \mathbb{N}$, $j \in [k]$ and $A \subset [k]^n$ are the words with less than $m$ occurrences of $j$, then $\nu(A) \le mk/n$.

2. Let $A \subset [k]^n$, $n \ge k$, and $D \subset [k]^n$ be the union of degenerate orbits. Then (i) $\nu(D) \le k^2/n$, (ii) $\tilde\nu(A) = (1 - \nu(D))^{-1} \nu(A)$ if $A$ consists of non-degenerate words only, and (iii) $|\nu(A) - \tilde\nu(A)| \le k^2/n$ for any $A$.

3. If $n \ge d \ge k$, the above defined density $\nu'_d$ on $[k]^n$ coincides with $\tilde\nu$.



Proof. 1. We may set $j = k$. By the definition of $\nu$, $\nu(A)$ equals the ratio $|M| / \binom{n+k-1}{k-1}$ where $M$ is the set of tuples $(r_1, \dots, r_k) \in \mathbb{N}_0^k$, $\sum r_i = n$, with $r_k = l < m$. Thus

$$\nu(A) = \sum_{l=0}^{m-1} \frac{\binom{n+k-2-l}{k-2}}{\binom{n+k-1}{k-1}} = \sum_{l=0}^{m-1} \frac{k-1}{n+k-1} \cdot \frac{\binom{n+k-2-l}{k-2}}{\binom{n+k-2}{k-2}} \le \frac{mk}{n}.$$

2. The bound on $\nu(D)$ follows from part 1 with $m = 1$. The second claim is just the rescaling of $\nu$ defining $\tilde\nu$. To show the last claim set $A_1 = A \cap ([k]^n \setminus D)$ and $A_2 = A \cap D$. Then (by part 1 and (ii)) $0 \le \tilde\nu(A_1) - \nu(A_1) = \nu(D)\tilde\nu(A_1) \le \nu(D) \le k^2/n$ and $0 \le \nu(A_2) - \tilde\nu(A_2) = \nu(A_2) \le \nu(D) \le k^2/n$. Since $A = A_1 \cup A_2$ is a partition, subtraction of the two estimates gives $|\nu(A) - \tilde\nu(A)| \le k^2/n$.

3. Let n > d> k, x G [k] n be a word, Xj C [n] for j G [k] be the positions of 
the letter j in x and rj = \Xj\. We assume that all rj > 1 because for degenerate 
x we clearly have v' d ({x}) = = v{{x\) (if y and z are non-degenerate then 
so is y * z). The factorizations x = y * z, with non-degenerate y G [d] n and 
z G 1-1 correspond to the pairs (P, Z) where P is a partition of [n] (a 
set of nonempty blocks) such that \P\ = d and if B G P then P C for 
some j, and Z : P — » [cZ] is a bijection. P and Z determine y and z uniquely 
(j/i = t -i==^> i G B G P with Z(P) = £ and 2j = j Z(P) = i for some 
B G P with P C Xj). We can generate the pairs (P, Z) also as follows. We take 

all Ze-tuples i = . . . , iu) G N fc with |i| = i\ H h ik = d, for each i take all fc 

ij-tuples s(j) G N lj , j = 1, 2, . . . , k, with |s(j)| = rj, then for each s(j) take all 

Qj)) = ~0F ^ here s(j ') ! = s 0')i ! --- s 0')*r) ordered partitions (Y jtl , . . . , Y jyij ) 
of Xj with \Yj, t \ = s(j) t , and finally we forget the orders of blocks Y ; . and label 
the d blocks in each resulting collection in d\ ways with 1,2, ... ,d. This way 
we produce each pair (P, Z) with multiplicity i\ = i\\ . . . (ij is the number of 
blocks of P contained in Xj). Thus 

, = v ^( S (1))U1))---(^))A! 

<ivi x /J 2^ Z"- 1N \Z ™ "\ Z d ~ 1N \Z dN \ 

||i||=fc,|i|=d,|| s (j)ll=^>l«(j)l=^ \d-l)\s(l)s(2)...s(k)) ' U-lAJ 

where || • || is the arity of a tuple, s(l)s(2) . . . s(k) means concatenation of the ij- 
tuples into one cZ-tuple and the denominator gives i>i({y}) -i> 2 ({z}) . By cancelling 
the common factors in the summand we simplify the sum to (r = (ri, . . . , rk)) 

{n - d)\(k - l)\(d - k)± ^2 i 



||i||=fc,|i|=«I,|| 8 y)ll=ii,l»Cj)l=ri 



(n-l)\0 

The last sum equals

\[
\sum_{\|i\|=k,\ |i|=d} \binom{r_1-1}{i_1-1}\binom{r_2-1}{i_2-1}\cdots\binom{r_k-1}{i_k-1} = \binom{n-k}{d-k} = \frac{(n-k)!}{(d-k)!\,(n-d)!},
\]

because we are counting $(d-k)$-element subsets $Y$ of an $(n-k)$-element set $X$ according to the sizes of intersections of $Y$ with blocks of a fixed partition of $X$ into blocks with sizes $r_j - 1$. Hence the sum equals $1 / \binom{n-1}{k-1}\binom{n}{r} = \nu'(\{x\})$. □
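The identity of part 3 can be checked by brute force for small parameters. The following is only a sketch under stated assumptions: it takes the composition to be $(y * z)_i = z_{y_i}$ and the non-degenerate equal-slices density to be $\nu'(\{x\}) = 1/\binom{n-1}{k-1}\binom{n}{r}$ for non-degenerate $x$ (and $0$ otherwise); the function names are ours and hypothetical.

```python
from fractions import Fraction
from itertools import product
from math import comb, factorial


def nondegenerate_equal_slices(word, k):
    """Assumed nu'({word}) on [k]^n: zero unless every letter 1..k occurs."""
    n = len(word)
    counts = [word.count(letter) for letter in range(1, k + 1)]
    if min(counts) == 0:
        return Fraction(0)
    slice_size = factorial(n)  # becomes the multinomial coefficient n!/r!
    for c in counts:
        slice_size //= factorial(c)
    return Fraction(1, comb(n - 1, k - 1) * slice_size)


def compose(y, z):
    """Assumed composition of words: (y * z)_i = z_{y_i}, letters are 1-based."""
    return tuple(z[a - 1] for a in y)


def convolution(n, d, k):
    """Sum of nu'_1({y}) * nu'_2({z}) over all factorizations x = y * z."""
    total = {}
    for y in product(range(1, d + 1), repeat=n):
        weight_y = nondegenerate_equal_slices(y, d)
        if weight_y == 0:
            continue
        for z in product(range(1, k + 1), repeat=d):
            weight_z = nondegenerate_equal_slices(z, k)
            if weight_z == 0:
                continue
            x = compose(y, z)
            total[x] = total.get(x, Fraction(0)) + weight_y * weight_z
    return total


def check_identity(n, d, k):
    """True if the convolution agrees with nu' on every word of [k]^n."""
    conv = convolution(n, d, k)
    return all(
        conv.get(x, Fraction(0)) == nondegenerate_equal_slices(x, k)
        for x in product(range(1, k + 1), repeat=n)
    )
```

Exact rational arithmetic is used so that equality, rather than approximate closeness, is tested; the enumeration is exponential in $n$ and is meant only for very small cases.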

To go from $\mu$-density to $\nu$-density, we show that if one weights $[k]^n$ on the minority of $m$ coordinates uniformly and on the majority of remaining coordinates by equal-slices density, the resulting density is approximately $\nu$-density. For $k, m, n \in \mathbb{N}$ with $m < n$, we define a density $\mu'_m$ on $[k]^n$ by

\[
\mu'_m(\{z\}) = \sum_{M \ni (J,x,y) \mapsto z} \mu_1(\{J\}) \cdot \mu_2(\{x\}) \cdot \nu_1(\{y\})
\]

where $M$ consists of all triples $(J, x, y)$ with $J \subset [n]$, $|J| = m$, $x \in [k]^J$ and $y \in [k]^{[n] \setminus J}$, a triple is projected to $[k]^n$ in the obvious way, $\mu_1$ (resp. $\mu_2 = \mu_{2,J}$) is the uniform density on the set of $m$-element subsets of $[n]$ (resp. on $[k]^J$) and $\nu_1 = \nu_{1,J}$ is the equal-slices density on $[k]^{[n] \setminus J}$.

Proposition 12 Let $k, m, n \in \mathbb{N}$, $m < n$ and $\mu'_m$ be the above defined density on $[k]^n$. Then for every set $A \subset [k]^n$,

\[
|\mu'_m(A) - \nu(A)| \le km/n.
\]

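Proof. We prove the inequality, in fact a stronger one, first for $m = 1$. Let $z \in [k]^n$ and $r_j$ be the number of occurrences of the letter $j \in [k]$ in $z$. By the definition of $\mu'_m$ and $\nu$,

\[
\frac{\mu'_1(\{z\})}{\nu(\{z\})} = \binom{n+k-1}{k-1}\binom{n}{r} \sum_{j=1,\ r_j \ge 1}^{k} \frac{r_j/kn}{\binom{n+k-2}{k-1}\binom{n-1}{r - e_j}} = \frac{\binom{n+k-1}{k-1}}{\binom{n+k-2}{k-1}} \cdot \frac{1}{k} \sum_{j=1,\ r_j \ge 1}^{k} 1 \le \frac{\binom{n+k-1}{k-1}}{\binom{n+k-2}{k-1}} = 1 + \frac{k-1}{n},
\]

where $\binom{n}{r} = n!/r_1! \cdots r_k!$ and $r - e_j$ is the tuple $r$ with the $j$-th entry decreased by 1. So $\mu'_1(\{z\}) - \nu(\{z\}) \le \frac{k-1}{n}\nu(\{z\})$. Summing over $z \in A$ we deduce that $\mu'_1(A) - \nu(A) \le \frac{k-1}{n}\nu(A) \le \frac{k-1}{n}$. Since $\mu'_1$ and $\nu$ both have total mass 1, the same bound applied to the complement $[k]^n \setminus A$ gives $\nu(A) - \mu'_1(A) \le \frac{k-1}{n}$, and therefore

\[
|\mu'_1(A) - \nu(A)| \le \frac{k-1}{n}.
\]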

We derive from this that $|\mu'_m(A) - \mu'_{m-1}(A)| \le k/n$ for every $A \subset [k]^n$ and $m \ge 2$. The inequality $|\mu'_m(A) - \nu(A)| \le mk/n$ then follows by induction and the triangle inequality. Let $m \ge 2$ and $A \subset [k]^n$. We partition the set of triples $M$ defining $\mu'_{m-1}$ by the equivalence $(J, x, y) \sim (J', x', y')$ iff $J = J'$ and $x = x'$. So (projecting $(J, x, y)$ to $[k]^n$ when needed)

\[
\mu'_{m-1}(A) = \sum_{B \in M/\sim} \mu_1(\{J\}) \cdot \mu_2(\{x\}) \sum_{B \ni (J,x,y) \in A} \nu_1(\{y\}).
\]

We replace in each inner sum the equal-slices density $\nu_1$ on $[k]^{[n] \setminus J}$ (now $|J| = m - 1$) with the density $\mu'_1$, and this changes the total sum to, say, $\mu''(A)$. Summing the changes of inner sums, the result for $m = 1$ gives that $|\mu''(A) - \mu'_{m-1}(A)| \le (k-1)/(n-m+1)$. A moment of reflection reveals that the change of $\nu_1$ on each $[k]^{[n] \setminus J}$ to $\mu'_1$ gives an equivalent, only a more complicated, way of counting $\mu'_m(A)$ (it boils down to the identity $\binom{n}{m} = \frac{n-m+1}{m}\binom{n}{m-1}$). Thus $\mu''(A) = \mu'_m(A)$ and $|\mu'_m(A) - \mu'_{m-1}(A)| \le (k-1)/(n-m+1) \le k/n$ as needed, because we may assume that $n \ge km$ (else the result holds trivially). □
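For very small parameters, the bound of Proposition 12 can be examined numerically. The sketch below assumes that the equal-slices density is $\nu(\{z\}) = 1/\binom{n+k-1}{k-1}\binom{n}{r}$ and uses the fact that, for two probability densities, the maximum of $|\mu'_m(A) - \nu(A)|$ over all sets $A$ is half of their $\ell_1$ distance. The function names are hypothetical.

```python
from fractions import Fraction
from itertools import combinations, product
from math import comb, factorial


def equal_slices(word, k):
    """Assumed nu({word}) on [k]^n: 1 / (number of slices * size of the word's slice)."""
    n = len(word)
    slice_size = factorial(n)
    for letter in range(1, k + 1):
        slice_size //= factorial(word.count(letter))
    return Fraction(1, comb(n + k - 1, k - 1) * slice_size)


def mixed_density(z, m, k):
    """mu'_m({z}): uniform on the m coordinates in J, equal-slices on the rest."""
    n = len(z)
    total = Fraction(0)
    # For fixed J exactly one pair (x, y) projects to z: x = z on J, y = z off J.
    for J in combinations(range(n), m):
        rest = tuple(z[i] for i in range(n) if i not in J)
        total += equal_slices(rest, k)
    return total / (comb(n, m) * k**m)


def max_deviation(n, m, k):
    """max over A of |mu'_m(A) - nu(A)|, computed as half of the l1 distance."""
    words = product(range(1, k + 1), repeat=n)
    return sum(abs(mixed_density(z, m, k) - equal_slices(z, k)) for z in words) / 2
```

For instance, `max_deviation(4, 1, 2)` can be compared with `Fraction(2 * 1, 4)`; the computation enumerates all of $[k]^n$ and is therefore feasible only for tiny $n$.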

Propositions 11 and 12 show that we may replace $\mu$-density in Theorem 2 with $\nu$-density. Using this we derive Proposition 14, the key strengthening of Theorem 2.

For $k, m, n \in \mathbb{N}$, $J \subset [n]$ with $|J| = m$ and $y \in [k]^{[n] \setminus J}$, we denote by $S_{J,y}$ the $m$-dimensional subspace of $[k]^n$ that has $J$ as the set of free coordinates and elsewhere is determined by $y$: $x \in S_{J,y} \iff x_i = y_i$ for every $i \in [n] \setminus J$.

Proposition 13 Let $k \in \mathbb{N}$, $k \ge 2$, be given and assume Theorem 2 for $k$. It follows that for every $\delta > 0$ there is an $n_0 \in \mathbb{N}$ such that for every $n > n_0$ every set $A \subset [k]^n$ with $\nu(A) \ge \delta$ contains a line.

Proof. Let $\delta$ be given. We take the $n_0$ of Theorem 2 corresponding to uniform density $\delta/3$ and set $m = n_0 + k$. Suppose that $n > 3km/\delta = 3k(n_0 + k)/\delta$ and that $A \subset [k]^n$ has $\nu(A) \ge \delta$. By part 2 (iii) of Proposition 11, $\nu'(A) \ge 2\delta/3$. By Proposition 12 and the definition of density $\mu'_m$ before it, there exists an $m$-dimensional subspace $S = S_{J,y}$ of $[k]^n$, $J \subset [n]$ with $|J| = m$, such that $\mu_S(A) \ge \nu(A) - km/n \ge \delta/3$. By the choice of $m$ and Theorem 2 there is a line in $[k]^n$ contained in $A \cap S$. □

Proposition 14 Let $k \in \mathbb{N}$, $k \ge 2$, be given and assume Theorem 2 for $k$. It follows that for every $\delta > 0$ there exist an $n_0 \in \mathbb{N}$ and a $\theta > 0$ such that if $n > n_0$ and $A \subset [k]^n$ has $\nu(A) \ge \delta$, then the set $M \subset [k+1]^n$ of lines contained in $A$ has

\[
\nu(M) \ge \theta.
\]

Proof. Let $\delta$ be given. We take the $n_0$ of Proposition 13 corresponding to the $\nu$-density $\delta/2$ and set $d = n_0 + 1$. Suppose that $n > d + 4k^2/\delta$ and $A \subset [k]^n$ has $\nu(A) \ge \delta$. By part 2 (iii) of Proposition 11, $\nu'(A) \ge 3\delta/4$. For $y \in [d]^n$ we define $C_y = \{z \in [k]^d \mid y * z \in A\}$ (recall the composition of words $*$ introduced before Proposition 11). Let $B \subset [d]^n$ be the set of (non-degenerate) words $y$ such that $\nu'_2(C_y) \ge \delta/2$. By part 3 of Proposition 11 (applied with $k$), $\nu'(A) \ge 3\delta/4$ and the definition of $B$ imply that $\nu'_1(B) \ge \delta/4$. Deleting degenerate words (they are irrelevant for $\nu'_2$ anyway), we may assume that all words in every $C_y$ are non-degenerate. Consider the set

\[
M = \{x' \in [k+1]^n \mid x' = y * z',\ y \in B,\ z' \in [k+1]^d,\ L(z') \subset C_y\}.
\]

These are lines contained in $A$: $L(x') \subset A$ for every $x' \in M$. By the choice of $d$, for each $y \in B$ the set $C_y \subset [k]^d$ contains a line $L(z')$ and (due to the purge on $C_y$) $z'$ is non-degenerate. Let $\nu'_2$ (resp. $\nu'$) be the non-degenerate equal-slices density on $[k+1]^d$ (resp. on $[k+1]^n$). Since $\nu'_1(B) \ge \delta/4$ and for each $y \in B$ there is at least one non-degenerate $z' \in [k+1]^d$ with $y * z' \in M$, giving contribution at least $\nu'_2(\{z'\}) \ge d^{-k}(k+1)^{-d}$, by part 3 of Proposition 11 (applied with $k+1$) we see that

\[
\nu'(M) \ge \nu'_1(B) \cdot d^{-k}(k+1)^{-d} \ge (\delta/4)\, d^{-k}(k+1)^{-d}.
\]

By part 2 ((i) and (ii)) of Proposition 11 (and since $n > 4k^2$ and $k \ge 2$), the desired lower bound $\nu(M) \ge (1 - (k+1)^2/n)\,\nu'(M) \ge (\delta/9)\, d^{-k}(k+1)^{-d} = \theta > 0$ follows. □



To go from $\nu$-density to $\mu$-density, we show that if one weights $[k]^n$ on the minority of $m$ coordinates by $\nu$-density and on the majority of remaining coordinates uniformly, the resulting density is approximately $\mu$-density. We prove it in greater generality with any density $\mu'$ on the minority of $m$ coordinates. For $k, m, n \in \mathbb{N}$ with $m < n$ and a density $\mu'$ on $[k]^m$, we define a density $\mu'_m$ on $[k]^n$ by

\[
\mu'_m(\{z\}) = \sum_{M \ni (\sigma,x,y) \mapsto z} \mu_1(\{\sigma\}) \cdot \mu'(\{x\}) \cdot \mu_2(\{y\})
\]

where $M$ consists of all triples $(\sigma, x, y)$ with $\sigma : [m] \to [n]$ an injection, $x \in [k]^m$ and $y \in [k]^{[n] \setminus \sigma([m])}$, a triple $(\sigma, x, y)$ is projected to $[k]^n$ by setting $z_{\sigma(i)} = x_i$ for $i \in [m]$ and $z_i = y_i$ for $i \in [n] \setminus \sigma([m])$, $\mu_1$ (resp. $\mu_2 = \mu_{2,\sigma}$) is the uniform density on the set of injections from $[m]$ to $[n]$ (resp. on $[k]^{[n] \setminus \sigma([m])}$) and $\mu'$ is the given density on $[k]^m$.

Proposition 15 Let $k, m, n \in \mathbb{N}$ and $\eta > 0$ be such that $m \le n^{1/4}$ and $n \ge (12k/\eta)^{12}$, $\mu'$ be a density on $[k]^m$ and $\mu'_m$ be the above corresponding density on $[k]^n$. Then for every set $A \subset [k]^n$,

\[
|\mu'_m(A) - \mu(A)| \le \eta.
\]



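Proof. It suffices to consider only $\mu' = \mu'_u$ given, for some $u \in [k]^m$, by $\mu'(\{u\}) = 1$ and $\mu'(\{x\}) = 0$ for $x \ne u$, because any density $\mu'$ on $[k]^m$ is a convex combination of these densities, $\mu' = \sum_u c_u \mu'_u$ ($c_u \ge 0$, $\sum_u c_u = 1$), and the corresponding density on $[k]^n$ is the same convex combination of the densities corresponding to the $\mu'_u$; the general result follows by the triangle inequality.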

We fix words $u \in [k]^m$ and $z \in [k]^n$ such that $z$ has between $n/k - n^{2/3}$ and $n/k + n^{2/3}$ occurrences of each letter $j \in [k]$ (by Proposition 8 only very few $z$ are not like this). If $p$ (resp. $q$) is the minimum (resp. maximum) number of occurrences of a letter $j$ in $z$ (clearly $p > m$) then

\[
\Big(\frac{p-m+1}{n}\Big)^m k^{m-n} \le \mu'_m(\{z\}) \le \Big(\frac{q}{n-m}\Big)^m k^{m-n}
\]

because $\mu_1(\{\sigma\}) = 1/n(n-1)\cdots(n-m+1)$ lies between $n^{-m}$ and $(n-m)^{-m}$, the number of $\sigma$ satisfying $u_i = z_{\sigma(i)}$ for $i \in [m]$ is at least $(p-m+1)^m$ and at most $q^m$ ($\sigma$ determines $x$ and $y$) and $\mu_2(\{y\}) = k^{m-n}$. Since $\mu(\{z\}) = k^{-n}$, $n/k - n^{2/3} \le p \le q \le n/k + n^{2/3}$ and $m \le n^{1/4}$, we have

\[
(1 - 2kn^{-1/3})^m \le \Big(\frac{k(p-m+1)}{n}\Big)^m \le \Big(\frac{kq}{n-m}\Big)^m \le (1 + 2kn^{-1/3})^m.
\]

Since $1 - \delta \le e^{-\delta} \le 1 - \delta/2$ and $1 + \delta \le e^{\delta} \le 1 + 2\delta$ if $\delta \in (0, \frac{1}{2})$, we deduce that $\mu'_m(\{z\})/\mu(\{z\})$ lies in $(1 - 4kmn^{-1/3},\ 1 + 4kmn^{-1/3})$ provided that $4kmn^{-1/3} < \frac{1}{2}$. This is true as $4kmn^{-1/3} \le 4kn^{-1/12} \le \eta/3 < \frac{1}{2}$. So

\[
|\mu'_m(\{z\})/\mu(\{z\}) - 1| \le \eta/3.
\]

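Let $[k]^n = B \cup C$, where $B$ are the words meeting the condition on occurrences of letters and $C$ are the remaining words. By Proposition 8, $\mu(C) \le kn^{-1/3} \le \eta/3$. Since $\mu'_m(\{z\}) \ge (1 - \eta/3)\mu(\{z\})$ for $z \in B$, we have $\mu'_m(B) \ge (1 - \eta/3)\mu(B) \ge (1 - \eta/3)^2 \ge 1 - 2\eta/3$ and $\mu'_m(C) \le 2\eta/3$. We conclude that

\[
|\mu'_m(A) - \mu(A)| \le \sum_{z \in A \cap B} |\mu'_m(\{z\}) - \mu(\{z\})| + |\mu'_m(A \cap C) - \mu(A \cap C)| \le \sum_{z \in A \cap B} \mu(\{z\})(\eta/3) + 2\eta/3 \le \eta.
\]

□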

We apply Proposition 15 to three densities $\mu'$ on $[k]^m$, all invariant to permuting the $m$ coordinates. The definition of $\mu'_m$ then simplifies, as one can put the $\sigma$ with the common $m$-element image $J \subset [n]$ together and sum over the triples $(J, x, y)$. The first application with $\mu'(A) = \mu_B(A)$, where $A \subset [k]^m$ and $B = [k-1]^m$, gives Proposition HI. In the other two applications of Proposition 15, $\mu'$ is the equal-slices density on $[k]^m$, respectively the density given by $\mu'(A) = \nu'(A \cap [k-1]^m)$ where $\nu'$ is the equal-slices density on $[k-1]^m$, and we get the next proposition, for which we introduce the following notation. The truncation $S' \subset S \subset [k]^n$ of an $m$-dimensional subspace $S$ is obtained by forbidding $k$ as the value of $x \in S$ on the free coordinates; $S'$ 1-1 corresponds with $[k-1]^m$. For $A \subset [k]^n$ we define $\nu_{S'}(A)$ as $\nu'(A')$ where $A'$ is the image of $A \cap S'$ in the bijection between $S'$ and $[k-1]^m$ and $\nu'$ is the equal-slices density on $[k-1]^m$.

Proposition 16 Let $\delta, \eta > 0$ and $k, m, n \in \mathbb{N}$ satisfy $m \le n^{1/4}$, $n \ge (12k/\eta)^{12}$ and $A \subset [k]^n$ be a set with $\mu(A) = \delta$. Then the (uniform) averages of the functions $S \mapsto \nu_S(A)$ and $S \mapsto \nu_{S'}(A)$, over all subspaces $S = S_{J,y}$, $J \subset [n]$, $|J| = m$, and words $y \in [k]^{[n] \setminus J}$, are both at least $\delta - \eta$.

To deduce Proposition 5 we need Propositions 11, 12, 14 and 16.

2.3 Derivation of Proposition 5

In this subsection we fix a $k \in \mathbb{N}$, $k \ge 3$, assume that Theorem 2 holds for $k - 1$ (and every $\delta > 0$) and deduce from this Proposition 5 for $k$. We proceed in three steps. First we show that for any positively $\mu$-dense $A \subset [k]^n$ there is a subspace $S \subset [k]^n$ such that either $A$ gets on $S$ $\nu$-denser (which gives the desired density increment at once), or $A$ gets positively $\nu$-dense on the truncation $S'$ of $S$ (recall that $S'$ has forbidden $k$ on the free coordinates) while losing not too much $\nu$-density on the whole $S$. In the crucial second step we obtain, assuming the second alternative and that $A$ is free of lines, a $\nu$-density increment of $A$ on a $k$-set $D$, an increment large enough to make up for the previous loss. In the third step we convert the $\nu$-density increment of $A$ on $D$ to a $\mu$-density increment.

Proposition 17 Let $\delta, \eta > 0$ and $m, n \in \mathbb{N}$ satisfy $\eta \le \delta/4$, $m \le n^{1/4}$, $n \ge (12k/\eta)^{12}$. Then for every set $A \subset [k]^n$ with $\mu(A) = \delta$ there exists an $m$-dimensional subspace $S \subset [k]^n$ such that 1 or 2 holds:

1. $\nu_S(A) \ge \delta + \eta = \mu(A) + \eta$;

2. $\nu_S(A) \ge \delta - 4\eta\delta^{-1} = \mu(A) - 4\eta\delta^{-1}$ and $\nu_{S'}(A) \ge \delta/4$, where $S' \subset S$ is the truncation of $S$ with values on the free coordinates lying in $[k-1]$.

Proof. We take uniformly the subspaces $S = S_{J,y}$, as described in Proposition 16. Let $M$ be the set of $S$ with $\nu_S(A) < \delta - 4\eta/\delta$ and $N$ be the set of $S$ with $\nu_{S'}(A) < \delta/4$. We assume that 1 does not hold, so $\nu_S(A) < \delta + \eta$ for every $S$, and show that then 2 holds. If $\mu(M) \ge \delta/2$ then the average of $\nu_S(A)$ over $S$ is at most

\[
(1 - \delta/2)(\delta + \eta) + (\delta/2)(\delta - 4\eta/\delta) = \delta + (1 - \delta/2)\eta - 2\eta < \delta - \eta,
\]

contradicting Proposition 16. So $\mu(M) < \delta/2$. Similarly, if $\mu(N) \ge 1 - \delta/2$ then the average of $\nu_{S'}(A)$ over $S$ is at most

\[
\delta/2 + (1 - \delta/2)(\delta/4) < 3\delta/4 \le \delta - \eta,
\]

again contradicting Proposition 16. So $\mu(N) < 1 - \delta/2$. Hence there is a subspace $S = S_{J,y}$ not in $M \cup N$ and 2 holds. □

For $x \in [k]^m$ (we have replaced $n$ by $m$ to indicate that we move into $S$) and $j \in [k-1]$, we denote, as before, by $x(j)$ the word obtained from $x$ by changing all $k$s to $j$s. For a set $A_1 \subset [k]^m$ and $j \in [k-1]$, we define

\[
C_j = \{x \in [k]^m \mid x(j) \in A_1\} \quad\text{and}\quad C = \bigcap_{j=1}^{k-1} C_j \subset [k]^m.
\]

Note that each $C_j$ is $jk$-insensitive and that, crucially, if $A_1$ contains no line then $A_1 \cap C \subset [k-1]^m$. Indeed, if $x \in A_1 \cap C$ had an occurrence of $k$, then $\{x\} \cup \{x(j) \mid j \in [k-1]\}$ would be a line in $[k]^m$ contained in $A_1$.
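The sets $C_j$ and $C$ are easy to compute explicitly for tiny $m$ and $k$, which makes the last observation concrete. The sketch below is an illustration only; it assumes that a line in $[k]^m$ is encoded by a template word over $[k+1]$ containing the letter $k+1$ (the wildcard), and the function names are hypothetical.

```python
from itertools import product


def collapse(x, j, k):
    """x(j): the word obtained from x by changing every letter k to j."""
    return tuple(j if a == k else a for a in x)


def insensitivity_sets(A1, m, k):
    """Return ({j: C_j}, C) with C_j = {x : x(j) in A1} and C the intersection."""
    A1 = set(A1)
    words = list(product(range(1, k + 1), repeat=m))
    C_j = {
        j: {x for x in words if collapse(x, j, k) in A1}
        for j in range(1, k)
    }
    C = set(words)
    for j in range(1, k):
        C &= C_j[j]
    return C_j, C


def contains_line(A, m, k):
    """True if A contains a combinatorial line of [k]^m.

    Assumed encoding: a template is a word over [k+1] with at least one
    wildcard k+1; its line consists of the k words obtained by replacing
    all wildcards by the same letter v in [k].
    """
    A = set(A)
    star = k + 1
    for template in product(range(1, k + 2), repeat=m):
        if star not in template:
            continue
        if all(
            tuple(v if a == star else a for a in template) in A
            for v in range(1, k + 1)
        ):
            return True
    return False
```

For example, for a line-free `A1` one may compute `_, C = insensitivity_sets(A1, m, k)` and inspect `set(A1) & C`; by the observation above, no word in this intersection should contain the letter `k`.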
Proposition 18 For every $\delta_1 > 0$ there is an $m_0 \in \mathbb{N}$ and a $\theta > 0$ such that the following holds. If $m > m_0$ and $A_1 \subset [k]^m$ contains no line, $\nu(A_1) \ge \delta_1$ and (measured in $[k-1]^m$) $\nu(A_1 \cap [k-1]^m) \ge \delta_1/4$, then there is a $k$-set $D \subset [k]^m$ satisfying

\[
\nu(A_1 \cap D) \ge \nu(A_1)\nu(D) + \delta_1\theta/2k \ge \delta_1\nu(D) + \delta_1\theta/2k.
\]

Proof. Let $\delta_1$ be given. Applying Theorem 2 for $k - 1$ and Proposition 14, we take an $m_0$ and a $\theta > 0$ such that if $m > m_0$ then for every set $B \subset [k-1]^m$ with $\nu(B) \ge \delta_1/4$ the set $M \subset [k]^m \setminus [k-1]^m$ of lines contained in $B$ has $\nu(M) \ge \theta$; we also assume $m_0$ so big that $m > m_0$ implies $k/\theta m \le \delta_1/2$. Now let $m > m_0$ and $A_1 \subset [k]^m$ be as stated, with the above defined sets $C_j$ and $C$; for convenience we denote $\delta_1 = \nu(A_1)$. By the assumptions we may take as $B$ the set $B = A_1 \cap [k-1]^m$. The lines $M$ in $B$ then 1-1 correspond to the words in $C \setminus [k-1]^m$. Hence $\nu(C \setminus [k-1]^m) \ge \theta$. We observed above that $C \setminus [k-1]^m$ is disjoint to $A_1$. Therefore using part 1 of Proposition 11 we get

\[
\nu(A_1 \cap C) \le k/m \le \theta\delta_1/2 \le (\delta_1/2)\nu(C).
\]

For $j \in [k]$ we set $D^{(j)} = C_1 \cap \dots \cap C_{j-1} \cap ([k]^m \setminus C_j)$; $D^{(1)} = [k]^m \setminus C_1$ and $D^{(k)} = C$. Thus $[k]^m = \bigcup_{j=1}^{k} D^{(j)}$ is a partition. By $\nu(A_1) = \delta_1$ and $\nu(A_1 \cap D^{(k)}) \le (\delta_1/2)\nu(D^{(k)})$,

\[
\begin{aligned}
\nu(A_1 \cap (D^{(1)} \cup \dots \cup D^{(k-1)})) &\ge \delta_1 - (\delta_1/2)\nu(D^{(k)}) \\
&= \delta_1(1 - \nu(D^{(k)})) + (\delta_1/2)\nu(D^{(k)}) \\
&\ge \delta_1\nu(D^{(1)} \cup \dots \cup D^{(k-1)}) + \delta_1\theta/2.
\end{aligned}
\]

Thus $\nu(A_1 \cap D^{(j)}) \ge \delta_1\nu(D^{(j)}) + \delta_1\theta/2(k-1)$ for some $j \in [k-1]$. We set, for this $j$, $D_i = C_i$ for $i < j$, $D_j = [k]^m \setminus C_j$ and $D_i = [k]^m$ for $i > j$. Clearly, each $D_i$ is $ik$-insensitive. The $k$-set $D = \bigcap_{i=1}^{k-1} D_i = D^{(j)}$ satisfies the displayed inequality. □

This is the heart of the proof of Theorem 2, transmuting the inductive assumption on the level $k - 1$ into a density increment on the level $k$. The quantities $m_0 = m_0(\delta_1)$ and $\theta = \theta(\delta_1)$ come from the validity of Theorem 2 for $k - 1$. In particular, note that $\theta$ can be assumed nondecreasing in $\delta_1$ (it is obvious from the proof but perhaps is not so clear from the statement).

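Proposition 19 Let $\beta, \delta_2 \in (0, 1)$, $m, r \in \mathbb{N}$, $\beta \ge kr/m$ and let $A_2 \subset D \subset [k]^m$ be sets satisfying $\nu(A_2) \ge \delta_2\nu(D) + 3\beta$. Then there exists a subspace $V \subset [k]^m$ with dimension $r$ such that

\[
\mu_V(A_2) \ge \delta_2\mu_V(D) + \beta.
\]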

Proof. The average of $\mu_V(A_2) - \delta_2\mu_V(D)$ over all subspaces $V = S_{J,y}$, with $J \subset [m]$, $|J| = r$, taken uniformly and $y \in [k]^{[m] \setminus J}$ taken according to $\nu$-density, equals $\mu'_r(A_2) - \delta_2\mu'_r(D)$ where $\mu'_r$ is the density on $[k]^m$ introduced before Proposition 12. By Proposition 12 and the assumptions this is at least $(\nu(A_2) - \beta) - \delta_2(\nu(D) + \beta) \ge \nu(A_2) - \delta_2\nu(D) - 2\beta \ge \beta$. Thus a subspace $V = S_{J,y}$ exists that satisfies the displayed inequality. □

We prove Proposition 5. Let $r \in \mathbb{N}$ and $\delta > 0$ be given ($k \ge 3$ is fixed) and suppose that $A \subset [k]^n$ contains no line, $\mu(A) \ge \delta$ and $n > n_0$; we specify a bound on $n_0$ at the end. We set $\delta_1 = \delta/2$ and take the $m_0 = m_0(\delta_1)$ and $\theta = \theta(\delta_1)$ of Proposition 18. Let $\eta = \delta^2\theta/32k$ and $m = \lfloor n^{1/4} \rfloor$. Note that $\eta \le \delta/4$, $\delta - 4\eta\delta^{-1} \ge \delta_1$, $m \le n^{1/4}$ and $\delta/4 \ge \delta_1/4$. By Proposition 17, applied for $\delta = \mu(A)$, $\eta$, $m$ and $A$, if $n \ge (12k/\eta)^{12}$ then there is an $m$-dimensional subspace $S \subset [k]^n$ satisfying alternative 1 or alternative 2. We denote by $A_1 \subset [k]^m$ the image of $A \cap S$ in the bijection between $S$ and $[k]^m$. We first consider alternative 1. So $\nu(A_1) \ge \mu(A) + \eta$. For $n$ large enough so that $\eta m/3k \ge r$, Proposition 19 applied for $\beta = \eta/3$, $\delta_2 = \mu(A)$, $m$, $r$, $A_2 = A_1$ and $D = [k]^m$, provides an $r$-dimensional subspace $V \subset [k]^m$ on which $\mu_V(A_1) \ge \mu(A) + \eta/3$. We achieved a $\mu$-density increment of $A$ on the $k$-set $D = W$ in the $r$-dimensional subspace $W \subset [k]^n$ that is the image of $V$ in the bijection between $[k]^m$ and $S$, with the increment $\gamma = \eta/3$. Clearly, $\mu_W(D) = 1 \ge \gamma$.

Let $S \subset [k]^n$ satisfy alternative 2 of Proposition 17. So $\nu(A_1) \ge \mu(A) - 4\eta/\mu(A) \ge \delta_1$ and $\nu(A_1 \cap [k-1]^m) \ge \delta/4 \ge \delta_1/4$. By Proposition 18 applied for $\delta_1$ and $A_1$, for large enough $n$ (so that $m > m_0$) there is a $k$-set $D_1 \subset [k]^m$ for which

\[
\begin{aligned}
\nu(A_1 \cap D_1) &\ge \nu(A_1)\nu(D_1) + \delta_1\theta/2k \ge \mu(A)\nu(D_1) - 4\eta\delta^{-1} + \delta_1\theta/2k \\
&\ge \mu(A)\nu(D_1) + \delta\theta/8k.
\end{aligned}
\]

We apply Proposition 19 with $\beta = \delta\theta/24k$, $\delta_2 = \mu(A)$, $m$, $r$, $A_2 = A_1 \cap D_1$ and $D_1$. For large enough $n$ (so that $\beta \ge kr/m$) it provides an $r$-dimensional subspace $V \subset [k]^m$ with $\mu_V(A_2) \ge \mu(A)\mu_V(D_1) + \beta$. Note that $D_1 \cap V$ is a $k$-set in $V$. We achieved a $\mu$-density increment of $A$ on the $k$-set $D = c(D_1 \cap V)$ in the $r$-dimensional subspace $W = c(V) \subset [k]^n$, where $c$ is the bijection between $[k]^m$ and $S$, with the increment $\gamma = \beta = \delta\theta/24k$. Clearly, $\mu_W(D) \ge \beta = \gamma$.

To summarize and integrate both cases, we see that for given $r \in \mathbb{N}$, $\delta \in (0, 1)$ and any $n > n_0$, for any set $A \subset [k]^n$ containing no line and with $\mu(A) \ge \delta$ there is an $r$-dimensional subspace $W \subset [k]^n$ and a $k$-set $D \subset W$ in $W$ such that $\mu_W(D) \ge \gamma$ and $\mu_W(A \cap D) \ge \mu(A)\mu_W(D) + \gamma$ (hence $\mu_D(A) \ge \mu(A) + \gamma$), with the desired density increment $\gamma = \min(\eta/3, \delta\theta/24k) = \eta/3 = \delta^2\theta/96k$. We observed above that $\theta$ is nondecreasing in $\delta_1 = \delta/2$ and so $\gamma$ is nondecreasing in $\delta$. Finally, the argument shows that the sufficient $n_0$ to take is, for $\eta = \delta^2\theta/32k$,

\[
n_0 = \big\lceil (12k/\eta)^{12} + (3kr/\eta)^4 + m_0^4 + (24k^2r/\delta\theta)^4 \big\rceil
\]

where $m_0 = m_0(\delta/2)$ and $\theta = \theta(\delta/2)$ are the quantities of Proposition 18 guaranteed by Theorem 2 for $k - 1$. This concludes the derivation of Proposition 5. The proof of Theorem 2, the density Hales-Jewett theorem, and consequently of Theorem 1, Szemerédi's theorem, is complete.
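To get a feeling for the size of the quantities in this summary, one can evaluate the formulas for $\eta$, $\gamma$ and $n_0$ directly. The sketch below only transcribes those formulas; the inputs `m0` and `theta` stand for $m_0(\delta/2)$ and $\theta(\delta/2)$, which the argument does not make explicit, so any values passed for them are purely hypothetical.

```python
from fractions import Fraction
from math import ceil


def increment_parameters(k, r, delta, m0, theta):
    """Return (eta, gamma, n0) from the formulas of the summary.

    m0 and theta are placeholders for the unspecified quantities
    m_0(delta/2) and theta(delta/2) of Proposition 18.
    """
    delta = Fraction(delta)
    theta = Fraction(theta)
    eta = delta**2 * theta / (32 * k)
    gamma = min(eta / 3, delta * theta / (24 * k))
    n0 = ceil(
        (12 * k / eta) ** 12
        + (3 * k * r / eta) ** 4
        + Fraction(m0) ** 4
        + (24 * k**2 * r / (delta * theta)) ** 4
    )
    return eta, gamma, n0
```

Exact fractions are used to avoid overflow and rounding in the twelfth power; even for moderate hypothetical inputs the term $(12k/\eta)^{12}$ dominates and $n_0$ is astronomically large.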
3 Concluding remarks and thoughts

In writing this text we were motivated also by the last sentence of the abstract in [20]: "Our proof is surprisingly simple: indeed, it gives arguably the simplest known proof of Szemerédi's theorem." How simple/long is then Polymath's proof of Szemerédi's theorem? The article [20] has 44 pages but the proof of Theorem 2 only starts after 32 pages in Section 7 and takes about 8 pages, during which it draws on various results and concepts obtained in the preceding part. The original article of Szemerédi [22] has 46 pages and Furstenberg's ergodic paper [5] 52. In the book of Moreno and Wagstaff, Jr. [17, Chapter 7], one of the few (if not the only one) monographs or textbooks presenting Szemerédi's combinatorial proof of his theorem, the proof takes 38 pages, and in the write-up of Tao [23] about 26. An article of Tao [23] of 49 pages gives a proof of Szemerédi's theorem based on a combination of ergodic methods and the approach of Gowers [8]. Towsner [25] gives a (not quite self-contained) model-theoretic proof of Szemerédi's theorem on 10 pages. (This list of proofs of Theorem 1 in the literature or on the Internet is far from exhaustive.) Our present write-up, a reshuffled and pruned form of Polymath's proof [20], demonstrates that it is possible to write down a self-contained combinatorial proof of Szemerédi's theorem well under 20 pages, which justifies the quoted sentence. Of course, it is even a proof of a stronger theorem, the density Hales-Jewett theorem.

As for the correctness of the proof in [20], we pointed out in the remark after the proof of Proposition [TU] a probably overlooked lower bound factor in [20, Lemma 8.1], but this is trivial to repair (which we did) and we did not notice in [20] anything more serious than that. In recent years formal proofs of various popular theorems were worked out, for example, for the Prime Number Theorem (Avigad et al. [3], Harrison [12]), Dirichlet's theorem on primes in arithmetic progression (Harrison [13]) or Jordan's curve theorem (Hales [11]). Szemerédi's theorem is known for the logical intricacy of its proof; an interesting project in formal proofs may be to produce a formal version for it or, for that matter, for the proof of the density Hales-Jewett theorem.

Many arguments of the proof in [20] as we present them are simple instances of the probabilistic method reasoning (see Alon and Spencer [2]), but we evade the words 'probability', 'random' or 'randomly' in our write-up (in [20] the last two words appear more than 90 times). We prefer the terminology of densities instead, to emphasize that we give in all cases explicit definitions and constructions of the densities (i.e., probability measures) used, which is not quite done in [20]. We consider it important, for the sake of rigorousness of the whole approach, to give these explicit definitions. For illustration consider the identity in part 3 of Proposition 11 for which we gave a verificational proof. The original proof of Polymath [20, pp. 1297-8], more elegant, is free of calculations and is based (in our terminology) on representing the non-degenerate equal-slices density $\nu'$ on $[k]^n$, $k \le n$, as a projection of a higher-dimensional density built from uniform densities. Informally ([20, p. 1295]): a $\nu'$-random word $x$ arises by selecting $n$ points $q_1, \dots, q_n$ around a circle in a random order, putting randomly $k$ delimiters $r_1, \dots, r_k$ in some $k$ distinct gaps out of the $n$ gaps determined by the $n$ points $q_i$, and then reading the positions of the letter $j \in [k]$ in $x$ in the indices $i$ of the points $q_i$ lying between $r_j$ and the delimiter clockwise preceding $r_j$. Formally (not an exact translation): for $x \in [k]^n$, let

\[
\mu'(\{x\}) = \sum_{(\pi,B') \mapsto x} \mu_1(\{\pi\}) \cdot \mu_2(\{B'\})
\]

where $\pi$ runs through the $n!$ permutations of $[n]$, $B'$ runs through the $k\binom{n}{k}$ pointed $k$-element subsets of $[n]$ (the pairs $(B, b)$ with $b \in B \subset [n]$, $|B| = k$), $\mu_i$ are uniform densities and $(\pi, B')$ projects to $[k]^n$ as follows. If $\pi = a_1a_2 \dots a_n$ and $B' = (B, b)$ with $B = \{b_1 < b_2 < \dots < b_k\}$ and $b = b_t$, we project $(\pi, B')$ to $x \in [k]^n$ by setting, for $j = 1, 2, \dots, k$, $x_{a_i} = j$ exactly for the terms $a_i$ in $\pi$ with $i$ in the interval $b_{t+j-1} \le i < b_{t+j}$, where the indices are taken modulo $k$ and the interval $b_k \le i < b_{k+1} = b_1$ is $[b_k, n] \cup [1, b_1)$. It is immediate to show that $\mu' = \nu'$.
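The formal construction above is explicit enough to be enumerated for tiny $n$ and $k$. The following sketch does so and compares the result with the non-degenerate equal-slices density, again assumed to be $\nu'(\{x\}) = 1/\binom{n-1}{k-1}\binom{n}{r}$ on non-degenerate words; the function names are ours and hypothetical.

```python
from fractions import Fraction
from itertools import combinations, permutations, product
from math import comb, factorial


def nondegenerate_equal_slices(word, k):
    """Assumed nu'({word}) on [k]^n: zero unless every letter 1..k occurs."""
    n = len(word)
    counts = [word.count(letter) for letter in range(1, k + 1)]
    if min(counts) == 0:
        return Fraction(0)
    slice_size = factorial(n)
    for c in counts:
        slice_size //= factorial(c)
    return Fraction(1, comb(n - 1, k - 1) * slice_size)


def circle_density(n, k):
    """mu' as a dict: project every pair (pi, (B, b_t)) to a word of [k]^n."""
    weight = Fraction(1, factorial(n) * k * comb(n, k))
    density = {}
    for pi in permutations(range(1, n + 1)):
        for B in combinations(range(1, n + 1), k):  # B[0] < B[1] < ... < B[k-1]
            for t in range(k):  # the pointed element is B[t]
                x = [0] * n
                for j in range(1, k + 1):
                    lo = B[(t + j - 1) % k]
                    hi = B[(t + j) % k]
                    if lo < hi:
                        indices = range(lo, hi)
                    else:  # cyclic interval [lo, n] followed by [1, hi)
                        indices = list(range(lo, n + 1)) + list(range(1, hi))
                    for i in indices:
                        x[pi[i - 1] - 1] = j  # x_{a_i} = j
                key = tuple(x)
                density[key] = density.get(key, Fraction(0)) + weight
    return density


def circle_matches_equal_slices(n, k):
    """True if mu' and nu' agree on every word of [k]^n."""
    mu = circle_density(n, k)
    return all(
        mu.get(x, Fraction(0)) == nondegenerate_equal_slices(x, k)
        for x in product(range(1, k + 1), repeat=n)
    )
```

The enumeration has $n! \cdot k\binom{n}{k}$ terms, so it is a sanity check for small cases such as `circle_matches_equal_slices(4, 2)` rather than a practical way to sample from $\nu'$.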

In conclusion, we want to remark that the use of non-uniform densities $\nu$ and $\nu'$ on words and their interplay with the uniform density is a really interesting and combinatorially beautiful feature of Polymath's proof [20].